Upgrade Azure Blob Storage SDK to v12 #2573

Draft
wants to merge 26 commits into
base: master

pamelafox
Contributor

Fixes #2566

This PR upgrades the azure-storage-blob SDK from v2 to v12, which involved a lot of interface changes. I followed the migration guide @ https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/storage/azure-storage-blob/migration_guide.md and was able to get all the previous functionality working, at least according to the tests.

For ease of testing, I added a simple example app and a devcontainer.azure.json which brings in the Azurite local emulator. That means you can open this repo inside a Codespace or Dev Container with that configuration, and Azure Blob Storage will be running for you.

@pamelafox
Contributor Author

There's an admin test that is failing in CI but not locally, so I'm going to add some debugging in the next commits to try to figure out what's happening.

@samuelhwilliams
Contributor

FYI, it is failing for me locally too 👀 Not that it helps much ... but I can possibly investigate a bit myself and see if I find anything.

@samuelhwilliams
Contributor

If it helps:

/Users/sam/work/personal/flask-admin/flask_admin/contrib/fileadmin/azure.py(220)read_file()
-> blob = self._container_client.get_blob_client(path).download_blob()
(Pdb) ll
215         def read_file(self, path):
216             path = self._ensure_blob_path(path)
217             if path is None:
218                 raise ValueError("No path provided")
219             breakpoint()
220  ->         blob = self._container_client.get_blob_client(path).download_blob()
221             return blob.readall()
(Pdb) pp path
'dummy.txt'
(Pdb) n
azure.core.exceptions.DeserializationError: Unable to deserialize response data. Data: undefined, bytearray
> /Users/sam/work/personal/flask-admin/flask_admin/contrib/fileadmin/azure.py(220)read_file()
-> blob = self._container_client.get_blob_client(path).download_blob()

@pamelafox
Contributor Author

Thanks! Might be a statefulness thing, I'll start a fresh env.

@LeXofLeviafan
Contributor

...Wouldn't it be better to include actual information in that error variable? I.e. make it an optional string (containing the error message if an error occurred).

The check would work the same, and the error could be logged properly before redirect (...as it probably should be, anyway.)
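
A minimal sketch of that idea, using only the standard library. The helper name `run_storage_action` and the logger name are hypothetical and do not come from this PR; the point is only that an optional string can carry the message while the "did it fail?" check stays the same.

```python
import logging
from typing import Callable, Optional

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("fileadmin_sketch")


def run_storage_action(action: Callable[[], object]) -> Optional[str]:
    """Run a storage call; return None on success or the error message on failure."""
    try:
        action()
    except Exception as exc:
        # Log before the caller redirects, so the cause is not lost.
        logger.warning("storage action failed: %s", exc)
        return str(exc)
    return None


error = run_storage_action(lambda: None)
if error:  # same truthiness check as with a boolean flag
    print("would flash:", error)
```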

@pamelafox
Contributor Author

@samuelhwilliams It was an error due to the tests using an older Azure storage emulator. I've updated them to the official Microsoft-hosted emulator now, and they're passing fine.
@LeXofLeviafan I agree that it would be nice if the errors were easier to debug. They should probably be logged as either warning or error, depending on whether they're likely user errors or server errors. That would be better in a different PR though, I think. I'm trying to only affect the Azure module for this one.

@@ -0,0 +1,14 @@
// For format details, see https://aka.ms/devcontainer.json.
{
"name": "flask-admin (Python + Azurite)",
Contributor Author

In the future, I could add Postgres and Mongo to this dev container too, to have a single container that can run all the tests. It should be fairly easy given tests.yaml has the services setup, just copying that to docker-compose.yaml.

    from azure.storage.blob import BlobServiceClient
    from azure.storage.blob import generate_blob_sas
except ImportError as e:
    raise Exception(
Contributor Author

I changed the handling of the ImportError to make mypy happier. I looked at other optional modules and they all seemed to handle ImportErrors a bit differently. I found one that did it this way. This seems fine since you shouldn't import the module unless you're using it.

@pamelafox
Contributor Author

Hm, it appears I still don't have an approach for handling unimportable Azure modules that makes mypy happy. Let me know if you have any thoughts about the best practice for importing extra modules in tests (and skipping tests if they don't exist). I'll take another look Monday otherwise.
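
For reference, one possible shape for "skip the tests if the extra isn't installed", sketched with the standard library only. The helper `has_module` and the test class name are hypothetical; whether this keeps mypy quiet depends on the project's mypy configuration, so treat it as a starting point rather than a fix. With pytest, `pytest.importorskip("azure.storage.blob")` covers the same ground.

```python
import importlib.util
import unittest


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` looks importable in this environment."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first, so a missing parent
        # package (e.g. no `azure` at all) raises ModuleNotFoundError.
        return False


@unittest.skipUnless(
    has_module("azure.storage.blob"), "azure-storage-blob is not installed"
)
class AzureImportSketchTest(unittest.TestCase):
    def test_sdk_is_importable(self) -> None:
        # Imported lazily, so collecting this file never fails on import.
        import azure.storage.blob

        self.assertTrue(hasattr(azure.storage.blob, "BlobServiceClient"))
```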

@samuelhwilliams
Contributor

I don't think we need to support tests running without azure installed - I'd probably be fine with you removing the try/except around that test import.

@pamelafox
Contributor Author

Okay, I've made it so that the tests assume you've got azure-storage-blob installed. I've also made the tests devcontainer bring in Postgres and Mongo too, so I was able to get all the tests passing in my dev container environment, without additional setup.

Comment on lines 56 to 57
self._container_name = container_name
self._connection_string = connection_string
Contributor

One of the changes I made on the s3 admin side when bringing it up to date was to have __init__ take a client instance rather than parameters that get passed to the client.

Do you think we should do something similar here and accept an instance of BlobServiceClient, or is it still fine to just use the connection string?

Contributor Author

Oh I think that's nice, as I personally don't typically use connection strings (this was my first time using from_connection_string), so that gives developers more flexibility as to how they connect. I can make that change. That'd be breaking, right?

Contributor

It would be, but this is scheduled to go out for the v2 release where we're making a bunch of breaking changes, so I'm ok with it.

If you'd be happy to, feel free :) 🙏
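
A rough sketch of what "accept a client instance" could look like. `AzureStorageSketch` is a hypothetical stand-in, not the class from this PR, and the client is typed as `Any` so the snippet runs without the SDK installed; the only SDK method it relies on is `get_container_client`.

```python
from typing import Any


class AzureStorageSketch:
    """Hypothetical storage wrapper that receives a ready-made client."""

    def __init__(self, blob_service_client: Any, container_name: str) -> None:
        # The caller decides how to authenticate and hands over the
        # finished client; this class no longer sees any credentials.
        self._client = blob_service_client
        self._container_name = container_name
        self._container_client = blob_service_client.get_container_client(
            container_name
        )


# With the real SDK the caller might build the client in either of these
# ways (placeholders in angle brackets are not real values):
#   BlobServiceClient.from_connection_string("<connection string>")
#   BlobServiceClient("https://<account>.blob.core.windows.net",
#                     credential=DefaultAzureCredential())
```

The trade-off is the breaking signature change discussed above, in exchange for not having to mirror every authentication option of the SDK.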

@samuelhwilliams
Contributor

I'm not sure the hstore extension is being set up correctly for the devcontainer. At least, I just opened vscode for this PR and ran the tests, and it complained that hstore didn't exist. After manually creating it and re-running, the tests seem good. Is that something you could check?

@samuelhwilliams
Contributor

samuelhwilliams commented Nov 25, 2024

I also think I'm getting one error in the devcontainer that feels odd: FAILED flask_admin/tests/fileadmin/test_fileadmin.py::TestLocalFileAdmin::test_file_admin - assert 200 == 302 🤔

But clearly it passes in CI, so...

@pamelafox
Contributor Author

@samuelhwilliams I put the hstore setup in the postAttachCommand, but that doesn't run until really late in the process, so it's possible you ran the tests before it had run. I can make it more robust by instead bringing in a Dockerfile for postgis and putting the CREATE EXTENSION in there.

@samuelhwilliams Do you get that error when there are no changes locally? Sometimes errors happen with partially aborted tests due to dangling changes to dummy files.
And is that in Codespaces or VS Code Dev Containers locally?

@samuelhwilliams
Contributor

samuelhwilliams commented Nov 25, 2024

Apologies - both of these problems were on my end, my checkout of your latest changes failed so I still had an old version of the PR cached.

After resetting that, unfortunately I get a new error that looks unrelated - not able to install requirements/dev.txt, presumably due to some pinned package incompatibilities. I think that probably deserves to be out-of-scope for fixing in this PR though... After manually changing devcontainer to install requirements/tests.txt instead, which does work, there are still some test failures due to an apparent incompatibility somewhere related to Flask 3.1.0's partitioned cookies.

But I think, yeah, neither of these things are strictly this PR's problem...

@samuelhwilliams
Contributor

samuelhwilliams commented Nov 25, 2024

Running the example app locally, after uploading a file, when I click the file to trigger a download I'm getting an error:

image

Can you tell if this is a fundamental issue or just related to the example app? Feels like it would be good to have the example app working fully in this regard.

Edit: Also one small other thing about the example app - most of the others have a landing page on / that has a link pointing to Flask-Admin. If you could add one of those it would be great 🙏

@pamelafox
Contributor Author

@samuelhwilliams I ran into both install issues. I've been commenting out numpy and manually installing Flask==3.0.0. I wasn't sure whether to tackle those in the PR or not.

@pamelafox
Contributor Author

@samuelhwilliams I'll check on the download error, that sounds like a flow I didn't do. (And perhaps isn't covered by tests, given they're passing). And add a link.

@samuelhwilliams
Contributor

@samuelhwilliams I ran into both install issues. I've been commenting out numpy and manually installing Flask==3.0.0. I wasn't sure whether to tackle those in the PR or not.

Ah ok that's reassuring for me at least. I think it's fine to leave it for a separate PR; I'll raise an issue on it...

@samuelhwilliams I'll check on the download error, that sounds like a flow I didn't do. (And perhaps isn't covered by tests, given they're passing). And add a link.

Brill, thanks!

@pamelafox
Contributor Author

The error with download_file looks like this Azurite issue:
Azure/Azurite#656
So I'm actually going to first see if the issue happens with a prod Azure account, and assuming it doesn't, I can shift focus to configuring Azurite correctly (as described in the issue).
That bit of the code was the trickiest to upgrade, so it's possible there's a lingering issue there.

@pamelafox
Contributor Author

Update:

  • The download file code was originally generating a SAS URL and redirecting to that URL. To generate a SAS URL securely, you need to use keyless authentication, which is tricky to set up with Azurite. However, I realized we could just use Flask's send_file after downloading the blob, since users don't need the URL, they just need the data, so I changed the code to use that. That's what I use for downloading blobs in my work projects.
  • I changed the constructor to take in a BlobServiceClient, and put examples of two ways of making the client in the Azure example (one for prod with keyless auth, one for local with connection string).
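
The download-then-send approach from the first bullet can be sketched roughly as below. `read_blob_for_download` is a hypothetical helper, the container client is typed as `Any` so the snippet runs without the SDK, and the `application/octet-stream` fallback is an assumption of this sketch rather than something the PR does.

```python
import io
from typing import Any, Tuple


def read_blob_for_download(container_client: Any, path: str) -> Tuple[io.BytesIO, str]:
    """Download a blob into memory and return (buffer, mime_type)."""
    downloader = container_client.get_blob_client(path).download_blob()
    content_settings = downloader.properties["content_settings"]
    # Assumed fallback for blobs stored without a content type.
    mime_type = content_settings["content_type"] or "application/octet-stream"
    # readall() pulls the whole blob into memory, so this suits small files.
    buffer = io.BytesIO(downloader.readall())
    return buffer, mime_type
```

In a Flask view the result could then be handed to `flask.send_file(buffer, mimetype=mime_type, as_attachment=True, download_name=...)`, which avoids exposing any storage URL to the browser.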

@pamelafox
Contributor Author

FYI: there are no type annotations for AzureFileAdmin and S3FileAdmin currently. When we add them in, we get a mypy error about "multiple values for keyword storage". I think it's confused by the args/kwargs around it. So I have left off type annotations on AzureFileAdmin for now, though I added them to the storage class in a few places.

@samuelhwilliams
Contributor

samuelhwilliams commented Dec 1, 2024

Just testing and still seeing errors on download that you said are related to Azure/Azurite#656, and also managed to upload a JPG and get an error from azure when trying to rename it. 🤔

If this is still WIP can we flip it to draft, as I'm not sure whether you want me to re-review it yet 👀 (or can you @ me when it's ready so that I know I need to look 🙏)

@pamelafox
Contributor Author

pamelafox commented Dec 1, 2024 via email

@samuelhwilliams
Contributor

samuelhwilliams commented Dec 1, 2024

Using AZURE_STORAGE_CONNECTION_STRING=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;

(I don't have a real azure account so might struggle to test there - let me know if I really need to get one 👀)
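
In case it helps when comparing setups: the connection string is a list of `Key=Value` pairs separated by `;`. A small stdlib-only sketch for inspecting one is below; `parse_connection_string` is a hypothetical helper (the SDK does its own parsing) and the account key in the example is a shortened placeholder.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value;' into a dict."""
    parts: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue  # a trailing ";" leaves an empty segment
        # partition() splits at the first "=", so the "==" padding at the
        # end of a base64 account key stays part of the value.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


example = (
    "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;"
    "AccountKey=abc==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"
)
print(parse_connection_string(example)["BlobEndpoint"])
```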

@pamelafox
Contributor Author

Hm, I haven't been able to replicate yet, on this branch in GitHub Codespaces with the dev container configuration.

One thing I could do early next week is to ask my colleagues to do a bug bash on this branch, with both local and prod, to see if they encounter any issues (since we all have Azure accounts). That'd be a fun excuse to intro them to flask-admin anyway.

Here's my successful rename:

Screenshot 2024-12-01 at 3 03 06 PM

And my logs - which aren't super useful except that they show which URL is being requested. From what I've read, Azurite "start_copy_from_url" should work as long as the server is the same (http://127.0.0.1:10000 in this case).

INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://127.0.0.1:10000/devstoreaccount1/fileadmin-tests?restype=REDACTED&comp=REDACTED&prefix=REDACTED'
Request method: 'GET'
Request headers:
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.24.0 Python/3.12.7 (Linux-6.5.0-1025-azure-x86_64-with-glibc2.36)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '29312128-b038-11ef-8796-0242ac120002'
    'Authorization': 'REDACTED'
No body was attached to the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
Response headers:
    'Server': 'Azurite-Blob/3.33.0'
    'content-type': 'application/xml'
    'x-ms-client-request-id': '29312128-b038-11ef-8796-0242ac120002'
    'x-ms-request-id': '86f5013e-ee43-4231-aa18-c934836d4e0d'
    'x-ms-version': 'REDACTED'
    'date': 'Sun, 01 Dec 2024 23:01:12 GMT'
    'Connection': 'keep-alive'
    'Keep-Alive': 'REDACTED'
    'Transfer-Encoding': 'chunked'
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://127.0.0.1:10000/devstoreaccount1/fileadmin-tests?restype=REDACTED&comp=REDACTED&prefix=REDACTED'
Request method: 'GET'
Request headers:
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.24.0 Python/3.12.7 (Linux-6.5.0-1025-azure-x86_64-with-glibc2.36)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '2932fd7c-b038-11ef-8796-0242ac120002'
    'Authorization': 'REDACTED'
No body was attached to the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
Response headers:
    'Server': 'Azurite-Blob/3.33.0'
    'content-type': 'application/xml'
    'x-ms-client-request-id': '2932fd7c-b038-11ef-8796-0242ac120002'
    'x-ms-request-id': '2bf381a6-acfa-4fd0-98a6-dbd53aa6518a'
    'x-ms-version': 'REDACTED'
    'date': 'Sun, 01 Dec 2024 23:01:12 GMT'
    'Connection': 'keep-alive'
    'Keep-Alive': 'REDACTED'
    'Transfer-Encoding': 'chunked'
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://127.0.0.1:10000/devstoreaccount1/fileadmin-tests?restype=REDACTED&comp=REDACTED&prefix=REDACTED'
Request method: 'GET'
Request headers:
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.24.0 Python/3.12.7 (Linux-6.5.0-1025-azure-x86_64-with-glibc2.36)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '2935dd08-b038-11ef-8796-0242ac120002'
    'Authorization': 'REDACTED'
No body was attached to the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
Response headers:
    'Server': 'Azurite-Blob/3.33.0'
    'content-type': 'application/xml'
    'x-ms-client-request-id': '2935dd08-b038-11ef-8796-0242ac120002'
    'x-ms-request-id': '38197744-d851-4f7b-80c1-0925836a038b'
    'x-ms-version': 'REDACTED'
    'date': 'Sun, 01 Dec 2024 23:01:12 GMT'
    'Connection': 'keep-alive'
    'Keep-Alive': 'REDACTED'
    'Transfer-Encoding': 'chunked'
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://127.0.0.1:10000/devstoreaccount1/fileadmin-tests/test.jpg'
Request method: 'PUT'
Request headers:
    'x-ms-requires-sync': 'REDACTED'
    'x-ms-copy-source': 'REDACTED'
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.24.0 Python/3.12.7 (Linux-6.5.0-1025-azure-x86_64-with-glibc2.36)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '2938575e-b038-11ef-8796-0242ac120002'
    'Authorization': 'REDACTED'
No body was attached to the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 202
Response headers:
    'Server': 'Azurite-Blob/3.33.0'
    'etag': '"0x1DF02BBD3F33C30"'
    'last-modified': 'Sun, 01 Dec 2024 23:01:12 GMT'
    'x-ms-client-request-id': '2938575e-b038-11ef-8796-0242ac120002'
    'x-ms-request-id': 'a40b52e4-32ed-44bf-8fef-9b28c9662e2e'
    'x-ms-version': 'REDACTED'
    'date': 'Sun, 01 Dec 2024 23:01:12 GMT'
    'x-ms-copy-id': 'REDACTED'
    'x-ms-copy-status': 'REDACTED'
    'Connection': 'keep-alive'
    'Keep-Alive': 'REDACTED'
    'Content-Length': '0'
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://127.0.0.1:10000/devstoreaccount1/fileadmin-tests/2QMXPCB2-102372.JPG'
Request method: 'DELETE'
Request headers:
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.24.0 Python/3.12.7 (Linux-6.5.0-1025-azure-x86_64-with-glibc2.36)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '293ab90e-b038-11ef-8796-0242ac120002'
    'Authorization': 'REDACTED'
No body was attached to the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 202
Response headers:
    'Server': 'Azurite-Blob/3.33.0'
    'x-ms-client-request-id': '293ab90e-b038-11ef-8796-0242ac120002'
    'x-ms-request-id': '360678d6-3589-463d-b133-bbd55c354b83'
    'x-ms-version': 'REDACTED'
    'date': 'Sun, 01 Dec 2024 23:01:12 GMT'
    'x-ms-delete-type-permanent': 'REDACTED'
    'Connection': 'keep-alive'
    'Keep-Alive': 'REDACTED'
    'Content-Length': '0'
INFO:werkzeug:127.0.0.1 - - [01/Dec/2024 23:01:12] "POST /admin/azurefileadmin/rename/?path=2QMXPCB2-102372.JPG HTTP/1.1" 302 -
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://127.0.0.1:10000/devstoreaccount1/fileadmin-tests?restype=REDACTED&comp=REDACTED&prefix=REDACTED'
Request method: 'GET'
Request headers:
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.24.0 Python/3.12.7 (Linux-6.5.0-1025-azure-x86_64-with-glibc2.36)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '295d70de-b038-11ef-a4ed-0242ac120002'
    'Authorization': 'REDACTED'
No body was attached to the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
Response headers:
    'Server': 'Azurite-Blob/3.33.0'
    'content-type': 'application/xml'
    'x-ms-client-request-id': '295d70de-b038-11ef-a4ed-0242ac120002'
    'x-ms-request-id': '5beb3d5b-a4b6-4fb1-8d88-0c673fc83806'
    'x-ms-version': 'REDACTED'
    'date': 'Sun, 01 Dec 2024 23:01:13 GMT'
    'Connection': 'keep-alive'
    'Keep-Alive': 'REDACTED'
    'Transfer-Encoding': 'chunked'
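
The PUT with `x-ms-copy-source` followed by the DELETE in the log above is the usual "rename as copy then delete" pattern, since Blob Storage has no rename call for regular blobs. A rough sketch of that flow, with a hypothetical `rename_blob` helper and the container client typed as `Any` so it runs without the SDK:

```python
from typing import Any


def rename_blob(container_client: Any, src_path: str, dst_path: str) -> None:
    """Copy a blob to a new name, then delete the original."""
    source = container_client.get_blob_client(src_path)
    target = container_client.get_blob_client(dst_path)
    # Server-side copy. requires_sync=True asks the service not to respond
    # until the copy has completed (the x-ms-requires-sync header).
    target.start_copy_from_url(source.url, requires_sync=True)
    # Remove the original only after the copy call has returned.
    source.delete_blob()
```

If the copy call raises, the delete never runs, so the original blob is left in place.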

@pamelafox
Contributor Author

I'm bug bash'ing this branch with some colleagues tomorrow, so hopefully we'll replicate that issue.

if not blob.properties or not blob.properties.has_key("content_settings"):
    raise ValueError("Blob has no properties")
mime_type = blob.properties["content_settings"]["content_type"]
blob_file = io.BytesIO()

I think the blob_file should be closed here. It will delete the buffer. flask.send_file doesn't seem to close it. It will be closed when the variable is GC'ed but if you're getting lots of files it could consume a lot of memory
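
One possible way to release the buffer deterministically, sketched with the standard library only: wrap it in a generator that closes it once the last chunk has been read. `iter_and_close` is a hypothetical helper, and the assumption is that the response is built from the generator (for example a Flask `Response(iter_and_close(blob_file), mimetype=mime_type)`) instead of passing the buffer to `send_file`.

```python
import io
from typing import BinaryIO, Iterator


def iter_and_close(buffer: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the buffer's contents in chunks, closing it when done."""
    try:
        while True:
            chunk = buffer.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # Runs when the generator is exhausted or closed early,
        # freeing the in-memory copy of the blob.
        buffer.close()


blob_file = io.BytesIO(b"abc" * 10)
body = b"".join(iter_and_close(blob_file, chunk_size=4))
print(len(body), blob_file.closed)
```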

@pamelafox
Contributor Author

We just did a little bug bash. Possible issues:

  • Directory name with "**" failed: Should that succeed? We think probably not?
  • Directory name with " " succeeded: Should that actually fail?
  • The date of new folders is 1970-01-01 00:00:00, should probably be actual date
  • Dates of files are in UTC/GMT, is that expected?
  • Memory leak with send_file (see comment)
  • Connection error when using a connection string with a prod storage account, versus keyless auth (which works fine).
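
On the 1970-01-01 folder date: that is what a modification time of 0 looks like once it is formatted, so presumably the folder entries report 0 rather than a real timestamp (an assumption, not verified against the code). A tiny sketch with a hypothetical `format_mtime` helper that also makes the UTC choice explicit:

```python
from datetime import datetime, timezone


def format_mtime(epoch_seconds: float) -> str:
    """Format a Unix timestamp as a UTC 'YYYY-MM-DD HH:MM:SS' string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(format_mtime(0))  # 1970-01-01 00:00:00
```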

Issues that are probably Codespace-related and shouldn't happen outside of Codespaces:

  • HTTP 413 from nginx when uploading files > 100 MB
  • 404 when clicking "Admin" link, due to :5000 at end of URL

Issues outside of this PR's scope:

  • Anthony thinks the UI would be clearer if ".." was at top of the UI in directory view. That's not an issue for this PR.

I'll move the PR to WIP while I triage those issues.

We did not replicate the rename-file issue with Azurite, all attendees were able to rename.

@pamelafox pamelafox marked this pull request as draft December 3, 2024 22:29
@samuelhwilliams
Contributor

I can still reproduce an error on file renaming that looks like this

image

From a bit of playing around, it seems limited to large files (4MB+ or so) rather than JPG or a specific filetype. Can you reproduce with that?

@pamelafox
Contributor Author

I tried a larger file, no luck, but can you attach your large file and also tell me what you tried renaming it to? In theory, I should be able to replicate, given we're in a containerized environment. If I still fail, then I'll DM next week on Discord and see if we can hop on a screen share or something.

Development

Successfully merging this pull request may close these issues.

Upgrade azure-storage-blob to >12
4 participants